Data README file


This document explains how the files are organized to replicate the findings in
"Every Little Bit Counts: The Impact of High-Speed Internet on the Transition to College" by Lisa Dettling, Sarena Goodman, and Jonathan Smith.

All required data and tools are in the "\Info for Replication" folder. The SAS and Stata code is in "\Info for Replication\Code", and the data are in "\Info for Replication\Data".


************************** Replication Steps ******************************************;

To replicate the analyses and tables in the paper, do the following:

1) Copy the "\Info for Replication" folder into your desired directory.   
2) Import and process the College Board data using the SAS code "\Info for Replication\Code\Creating Analysis Dataset.sas"
3) Process the Zip Code data using "\Info for Replication\Code\Create Zip Code Data.do".
 	NOTE: this do-file calls several other do-files in "\Info for Replication\Code", including "Creating Lagged
	Broadband Variables", minimize_rmse_avail_usage, read_access_data, read_cps_teenusage, read_pop_data,
	read_soi_data, read_urates_hprices, read_usage_rates, and read_zipcodes.
4) Create new variables, run analyses, and reproduce tables using the Stata code "\Info for Replication\Code\Cleanig Data and Main Analyses.do".  
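Before running steps 2-4, it can help to confirm that the copied folder contains every script those steps refer to. The Python sketch below is a hypothetical pre-flight check and is not part of the replication package: the function name missing_code_files is invented for illustration, and it assumes that the helper do-files named in the NOTE carry a ".do" extension and that the folder layout matches the paths above.

```python
from pathlib import Path

# Scripts run directly in steps 2-4, with file names as given above.
MAIN_SCRIPTS = [
    "Creating Analysis Dataset.sas",
    "Create Zip Code Data.do",
    "Cleanig Data and Main Analyses.do",
]

# Helper do-files called by "Create Zip Code Data.do".
# ASSUMPTION: each one is stored with a ".do" extension.
HELPER_DO_FILES = [
    "Creating Lagged Broadband Variables",
    "minimize_rmse_avail_usage",
    "read_access_data",
    "read_cps_teenusage",
    "read_pop_data",
    "read_soi_data",
    "read_urates_hprices",
    "read_usage_rates",
    "read_zipcodes",
]


def missing_code_files(root):
    """Return expected code file names that are absent from <root>/Code.

    `root` is the copied "Info for Replication" folder (hypothetical helper).
    """
    code_dir = Path(root) / "Code"
    expected = MAIN_SCRIPTS + [name + ".do" for name in HELPER_DO_FILES]
    # is_file() is simply False when the Code folder itself is missing,
    # so a wrong root reports every file as missing instead of raising.
    return [name for name in expected if not (code_dir / name).is_file()]
```

For example, calling missing_code_files on the copied "Info for Replication" folder returns an empty list when all twelve scripts are in place; any names it returns point to files that were not copied.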
   



********************** Information on Data Sets Provided ***********************************;

The SAS and Stata code is annotated, so the definitions of the variables we create for the analyses should be clear.

All publicly available datasets needed for replication are provided here: "\Info for Replication\Data\Raw Data\Other Data"

The following are publicly available data used in step 2: 

1) "ipeds_avg_sat_scores.sas7bdat" - average of the 25th and 75th percentile SAT scores of first-time freshmen, by college and year. Original data come from IPEDS.
2) "ipeds_flagship.sas7bdat" - indicator for whether the college is a flagship college. Original data come from IPEDS.
3) "ipeds_info.sas7bdat" - college characteristics by college and year, including state, level, and control. Original data come from IPEDS.
4) "ipeds_liberal_arts.sas7bdat" - indicator for whether the college is a liberal arts college. Original data come from IPEDS.
5) "zipcodedownload.sas7bdat" - zip code latitude, longitude, and state. Original source is unknown.

The following are publicly available data used in step 3:

(1) FCC high speed internet data: labeled "int" (e.g., int121999.txt, int062000.txt, int12200.txt, etc.) or "hzip" (e.g., hzip0604.xls, hzip0605.xls, etc.)
(2) House price indices from FHFA: HPI_AT_metro.csv and HPI_AT_nonmetro.xls
(3) Unemployment rates and CPI from BLS: lau0004.dta, lau0509.dta, lau9599.dta, la_series.txt, la_area.txt; cpi.dta, cpi_qtr.dta, cpi2010.dta (the raw LAU data files are large and can be obtained directly from BLS; code for processing the raw data is included in county_urates_hprices.do)
(4) SOI income data from IRS: zpallagi_short_09242014.dta
(5) Census 2000 zip code information (size, population, population by age, median home values): zipcodedata.txt, zcta5.txt, homevalues.dta, census_2000_sf1.dta (the data files used to create homevalues and census_2000_sf1 are large and can be downloaded directly from the Census Bureau; instructions and code for processing the raw data are included in read_pop_data.do and county_urate_hprices.do)
(6) High-speed internet usage data from Pew: hsi_use_pew.dta
(7) Data from CPS school enrollment and broadband internet usage supplements: cps_extract_adults.dta; cps_2000.dta; cps_2001.dta; cps_2003.dta; cps_2007.dta; cps_2009.dta (instructions for processing raw CPS data included in read_cps_teenusage.do)
(8) Various crosswalks between geographies: state_codes.dta, state_zip.dta, zip_county.dta, all_geocodes_v2009.txt, List1.xls
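To check which FCC files are on hand, the two naming families in item (1) can be matched by pattern. This is a minimal Python sketch, assuming the files sit directly in "\Info for Replication\Data\Raw Data\Other Data" and follow the "int"/"hzip" prefixes and .txt/.xls extensions shown in the examples; find_fcc_files is a hypothetical helper, and the sketch deliberately does not interpret the digits in the file names as dates.

```python
import re
from pathlib import Path

# ASSUMPTION: patterns inferred from the examples in item (1),
# e.g. int121999.txt ("int" + digits + .txt) and hzip0604.xls
# ("hzip" + digits + .xls). Matching is case-sensitive.
FCC_PATTERNS = {
    "int": re.compile(r"^int\d+\.txt$"),
    "hzip": re.compile(r"^hzip\d+\.xls$"),
}


def find_fcc_files(data_dir):
    """Group FCC high-speed internet files in data_dir by naming family."""
    found = {family: [] for family in FCC_PATTERNS}
    for path in sorted(Path(data_dir).iterdir()):
        if not path.is_file():  # skip subfolders such as nested archives
            continue
        for family, pattern in FCC_PATTERNS.items():
            if pattern.match(path.name):
                found[family].append(path.name)
    return found
```

Files from the other sources (for example lau0004.dta or HPI_AT_metro.csv) do not match either pattern and are ignored, so the function can be pointed at the whole "Other Data" folder.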


We cannot provide the proprietary College Board datasets. Researchers can apply for access to these data for replication purposes by contacting the College Board at:

The College Board
250 Vesey Street
New York, NY 10281
212-713-8088 

Or through its online data request portal: http://research.collegeboard.org/data/request

